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ABSTRACT 

We demonstrate highly accurate recovery of weak gravitational lensing shear using 
an implementation of the Bayesian Fourier Domain (BFD) method proposed by Bern¬ 
stein & Armstrong (2014, BA14), extended to correct for selection biases. The BFD 
formalism is rigorously correct for Nyquist-sampled, background-limited, uncrowded 
image of background galaxies. BFD does not assign shapes to galaxies, instead com¬ 
pressing the pixel data D into a vector of moments M, such that we have an analytic 
expression for the probability P{M\g) of obtaining the observations with gravitational 
lensing distortion g along the line of sight. We implement an algorithm for conducting 
BFD’s integrations over the population of unlensed source galaxies which measures ~ 10 
galaxies/second/core with good scaling properties. Initial tests of this code on ss 10® 
simulated lensed galaxy images recover the simulated shear to a fractional accuracy 
of m = (2.1 ± 0.4) X 10“^, substantially more accurate than has been demonstrated 
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previously for any generally applicable method. Deep sky exposures generate a suffi¬ 
ciently accurate approximation to the noiseless, unlensed galaxy population distribution 
assumed as input to BFD. Potential extensions of the method include simultaneous mea¬ 
surement of magnihcation and shear; multiple-exposure, multi-band observations; and 
joint inference of photometric redshifts and lensing tomography. 

Subject headings: gravitational lensing: weak—methods: data analysis 


1. Introduction 


Weak gravitational lensing (WL) provides an unambiguous measurement of the second (and 
potentially higher) derivatives of the scalar gravitational potential along the line of sight. This has 
made WL a critical observational window into the behavior and history of the components of the 
Universe that source the gravitational potential but do not absorb or emit photons. WL can in 
addition test the laws of gravitation relating the potential to the matter. Several visible/near-IR 
surveys of thousands of square degrees of sky are now underway, with measurement of WL signals 


from images of 0(10°) galaxy images as a primary goal: the Dark Energy Survey (DES) (Jarvis et 


^|2015), the Kilo-Degree Survey (KiDS) (Kuijken et al.||2015), and the Hypersuprime-cam Survey 


(HSC) (Takada 2010). Even more ambitious visible/near-IR WL surveys are planned to measure 
> 10® galaxy images in the 2020’s: the Large Synoptic Survey Telescope the Euclid 


spacecraft (Laureijs et al. 2011),^ and the Wide-Eield Infrared Survey Telescope (WEIRST)^WL 


Brown 

2015 

) and of the cosmic microwave background radiation (van Engelen et al. 

2012 

Das et 

al. 

2014 

Planck Collaboration 

2015 

Giannantonio et al.||2015). 


WL signals are known to be difficult to extract from sky images. Along nearly all lines of 
sight, the dominant manifestations of lensing are a magnification p, and a shear ( 51 , 52 ) of the 
image that dehne a local rotation-free linear transformation of the sky image. In this paper we 
will concentrate on estimating these parameters from real-space images of galaxies (with some 


mention of interferometric imaging in Section 6.1). The difficulties include: the signal is weak, with 
magnification and shear having RMS amplitudes of ~ 0.02 on cosmological lines of sight; the source 
galaxies are typically of comparable intrinsic size to the point spread function (PSF) of the imaging, 
and the PSF has asymmetries and variation larger than the lensing signal; the WL information is 
primarily in galaxies with modest signal-to-noise ratios {S/N < 25); and the light distribution from 
galaxies, unlike the CMB, is not described by any known statistical process. 


'http://www.Isst.org 
^http://www.euclid-ec.org 
^http://wfirst.gsfc.nasa.gov 
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A series of community-wide blind “challenges” in extracting shear distortion applied to simu¬ 
lated sky images provides a good summary of progress in overcome these obstacles, most recently 
the GREATS challenge ( Mandelbaum et al.|2015 ). A common quantification of the accuracy of infer¬ 
ence is to compare a measured shear component graeas to the value gtrue inserted into the simulation 
by 

ffmeas fftrue — ^5true A C, (1) 


where m is the multiplicative or calibration error on the shear, and c is spurious signal uncorrelated 
with the input shear, e.g. due to leakage of PSF asymmetries into ^meas- The ambitious next- 
generation surveys require \m\ < 10“^ and crms ^ 10 “^'® in order to keep shear-measurement errors 
from degrading the accuracy of cosmological inferences (jHuterer et al. 2006 |Amara Sz Refregier 


2008). The GREATS simulated galaxy samples are large enough to measure m to ±0.003 and c 
to ±10“^ at 68% confidence. Over 1000 measurements were submitted to GREATS using ~ 20 
distinct methodologies. None were consistently able to achieve m = c = 0 at this level of accuracy, 
though the sFIT algorithm (see Mandelbaum et al. 2015) did so in more than one branch of the 


challenge. There has not to date been any demonstration of a practical shear inference method 
that is accurate at the part-per-thousand level we will soon want. In this paper we show that 
the Bayesian Fourier Domain (BFD) method proposed by Bernstein & Armstrong ( 2014[ BA14) 
is in this regime, validating the method on simulations similar to some branches of GREATS. Our 
validation tests are more demanding than GREATS in the sense that we include lowei-S/N source 
images and hence require some correction for selection biases]^ 


Most of the current effort on shear measurement is directed towards model-fitting methods, 
whereby a parametric model of each galaxy’s appearance is convolved with the PSF and compared 
to the pixel data. The models are usually concentric combinations of exponential and deVaucouleurs 
ellipsoids (or other Sersic profiles). Galaxies are assigned the ellipticity of the model which has 
the maximum likelihood of reproducing the pixel data, or an ellipticity is assigned from some 
weighting of the likelihood surface. Model-htting methods can of course accrue biases because 


real galaxies are not fully described by the model |Voigt & Bridle 

2010 

Bernstein 

2010 

). Other 

methods {e.g. 

Kaiser et al. 

[TM5 

Bernstein Sz Jarvis||2002 

Bernstein||2010 

assign an ellipticity via 


model-independent means, typically as some combination of weighted central moments. In both 
approaches, a shear estimator gmeas is produced from some weighted sum of the galaxy ellipticities. 
While these methods overcome many of the WL inference difficulties noted above, none includes a 
rigorous treatment of the propagation of image noise through the measurement, and hence incur 
“noise biases,” e.g. because the maximum-likelihood parameters respond asymmetrically to noise, 
hence can accrue bias in the presence of noise. Furthermore all these methods are subject to 
selection biases, as the criteria for inclusion and/or weighting of galaxies’ ellipticities implicitly 
favor certain pre-lensing orientations of the sources. Recovery of unbiased WL estimators then 


“^We have not applied BFD to the public GREATS challenge data. The underlying galaxy population for the GREATS 
images was chosen to depend on the observing conditions, hence violating BFD’s basic assumption that deep images 
are sampling the same population as the target images. 
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relies upon applying corrections derived empirically from application of the method to simulated 
images with known lensing distortion {e.g. Gruen et al.||2010 Kacprzak et al.|[2012 ). The accuracy 
of such corrections thus depends upon the simulated images capturing all salient characteristics of 
the real sky and the observing process. 


The largest recent WL cosmology surveys have adopted variants of model-htting. The CFHTLS 
and KiDS surveys use the lensfit code, with empirical corrections averaging m ~ 0.05 but as large 
as m ~ 0.20 applied as described in Miller et al. (2013). The DES Science Verihcation WL results 
( Jarvis et al.j2015 ) use two parallel codes: NGMix ( ]Sheldonp014 ) and im3shape ( ]Zuntz et ah , 2013), 
the latter with empirical corrections for noise and selection bias which again range up to ~ 0.2. 


Jarvis et al. (2015) conclude from a battery of internal tests and inter-comparisons of the methods 


that an uncertainty of ±0.05 should be assigned to m in the final shear catalog. This accuracy is 
shown to be sufficient for the preliminary results from DES, but this and other surveys in progress 
will soon require WL inferences with ~ lOx better accuracy. Further, it is disconcerting that the 
simulation-derived corrections are ~ 50x larger than the accuracy needed for next-generation WL 
projects. 


BA14 propose a different approach to shear inference. They suggest skipping the estimation 
of galaxy shape properties, instead evaluating from the outset the probability P{Di\g) that the 
pixel data Di from galaxy i would be produced for lensing g on its line of sight. While a single 
galaxy provides only weak discrimination on the shear in that the derivative VgP is small, a 
large population of sources can tightly constrain the mean g, or any model for spatially varying 
WL. The BA14 method relies upon having high-S'/A^ observations of a representative sample of 
source galaxies, which can be obtained from observations of a subset of the survey region with 
longer integration times. In other words the model for the unlensed population of target galaxies 
is that it is drawn from the finite number of galaxies observed in these deep-integration fields—or 
more specifically, that the targets’ moments are drawn from those of the deep-held galaxies, up to 
rotation and parity transformations (as explained below). 

In Section we re-derive the BA14 BFD method, extending the method to include a pre¬ 
scription for detection and selection of sources for which selection biases can be calculated to high 
accuracy. Section describes our computational implementation of the BFD method, and in Sec¬ 
tion 1^ we demonstrate using simulations similar to GREAt3 that our implementation does indeed 
recover input shear near part-per-thousand accuracy, potentially unlocking the full power of current 
and future lensing surveys. 

In Section we take inventory of the assumptions and approximations made in deriving the 
BFD estimators, and assess the extent to which violations of these in real data might compromise 
the measurement accuracy. We hnd no show-stoppers yet. In Section we sketch a number of 
straightforward extensions of the current BFD implementation that will extend its science reach, e.g. 
to measurements of magnihcation as well as shear, to multi-band or interferometric measurements, 
and to integrating photometric redshift and tomographic WL inference into a single measurement 
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process. 


2. Formalism 


Our goal is to infer the lensing distortion g from the observational data vector D. Our current 
implementation assumes pure-shear distortions, so g = (< 71 , 52 )) but the formalism is unchanged if 
we include magnification g m. g as well. By Bayes’ theorem 

P{D\g)P{g) 

P{D) 

We will not be concerned with the normalization by the evidence P{D). In this paper we will 
assume that all galaxies are viewed through a common lensing g, and that the prior P{g) is much 
less informative than the data and can be taken as uniform. Thus we focus on determining P{D\g). 
We leave to future work the extension of the method to other circumstances, such as when g is 
known to follow some parametric form of position (discussed in BA14), or when g is drawn from a 
Gaussian random field. 



2.1. Simplest case 

Ultimately we will need to determine P{D\g) in the case where the data contain images of an 
arbitrary number of galaxies at unknown locations. We will assume that the pre-seeing, pre-lensing 
images are drawn from a known library of “template” galaxies, indexed by G, which in practice we 
will obtain by observing a fraction of our survey to significantly higher S/N. We begin, however, 
with a simple case, in which we know we are observing a single galaxy known to have underlying 
template index G. The position on the sky of some reference point in the galaxy (such as its 
centroid) we denote as xq- Knowing G we simulate the action of the lensing distortions and the 
observing process (namely the PSF and pixelization) to predict the data vector D^{g, xq) that we 
would obtain from a noiseless observation. The observed data vector is 

D = D^ + D^{g,XG), (3) 

and we assume that we know the likelihood function C{D'^) of the added noise. In the case that 
xg is known, we have 

P{D\g) = C{D^) = C[D- D^{g, ®g)] • (4) 

A central strategy of BFD is to compress the pixel data to a short vector M that carries 
most of the information about lensing distortion. The critical requirement on the compression is 
that we are able to propagate the distribution of into a probability P(Af|AT^) of observing 
compressed data M given that the noiseless underlying galaxy image compresses to . This is 
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most straightforwardly accomplished by having the compression be a linear operation on D such 
that Equation Q becomes 

M = M'^ + M^{g,XG), (5) 

and we will have, for fixed xq, 

P{M\g) = C{M'^) = C[M-M^{g,XG)\. (6) 


We choose for M a set of moments of the Fourier transform I°[k) of the observed surface 
brightness I°{x), defined as 


F{k-,xo)= [ (fxI°{x)e-^^'^^-^°^ 


( 7 ) 


Note that this compression requires a choice xq of coordinate origin. In this section we will assume 
that we have a priori knowledge of xg and can set xq = xg- In the next section we will develop a 
treatment for the case of unknown xg- The data D are a regular sampling of 1°, so in practice the 


Fourier transforms are 

discrete. We choose the compressed data vector 



/ Mf \ 


( 1 \ 

M{xo) = 

Mr 

M+ 


kl + kl 

U2 1.2 


V Mx yi 


^ ‘Ihxky J 


where T[k) is the Fourier transform of the PSF that has convolved the observed image. iy(|fc^|) is 
a real-valued window function applied to the integral to bound the noise, in particular confining the 
integral to the finite region of k in which T(k) is non-zero. We calculate the moments in Fourier 
domain in order to simplify the removal of the effects of the PSF, but these moments are equivalent 
to taking radially weighted zeroth and second moments of the real-space, pre-seeing image of the 
galaxy. 


The moments are not normalized, so that M remains a linear function of D. The noise 
moment vector, being a sum over the statistically independent noise of many pixels, will have a 
likelihood £(iW”) rapidly tend toward a multivariate Gaussian with covariance matrix Cm- We 
assume that the pixel noise is stationary, in which case there is no covariance between the noise 
at distinct k values, and the covariance matrix elements are related to the power spectrum Pn{k) 
of the noise by 



(9) 


Note that while background shot noise and detector read noise are stationary, any significant shot 
noise from the galaxy’s photons will violate stationarity. With sensible choice of W, the moments 
M carry most of the information available about shear of the source (Bernstein & Jarvis 2002). 


There are many practical benefits to discarding the rest of the information in D, as will become 
apparent, but we highlight first that is independent of the observational conditions, i.e. has 













been corrected for the PSF, so we do not need to recalculate as the PSF varies, as long as we 
hold W fixed. 

We note at this point that there is much freedom in the choice of VF. As long as W leads 
to finite noise Cm via Equation (§, BFD remains valid and unbiased; but the accuracy of the 
inference on g will depend upon the choice of W. One may choose to adjust this weight to optimize 
shear inference for a given set of observing conditions, but it is critical that the choice does not 
depend upon the properties of target galaxies. In this sense the BFD method is sub-optimal, in that 
it may not extract the most precise measure of g from both large and small galaxies simultaneously. 
We describe one choice for W in Section! 


The next assumption in BFD is that the tensing is weak, so that a second-order Taylor expan¬ 
sion about g = 0 fully describes P{M\g) for observed values of g. In this case we have 


P{M\g) = P-^Qg + ^gRg, 


( 10 ) 


P = P{M\g = 0) = C{M-M^) = |27rCMr^/^exp[-(M-M^)^Cj(/(M-M^)] , 


Q = VgP{M\g)\g^^ = -VgM^ • VmC (M - M«) 

VgVgP{M\g)\g^^. 


( 11 ) 

( 12 ) 

(13) 


At the end of Equation (12) we have assumed that the noise likelihood is invariant under shear 


of the underlying galaxy G so that we can propagate all shear derivatives into derivatives of the 
properties of the template galaxy. This is satisfied for background-limited images. The quantities 
Q and R give the differential probability of observing the image under lensing distortions. If D is 
comprised of many independent observations Di of the same underlying galaxy G with the same 
applied lensing, we can produce the quantities Pj,Qj,Rj as above for each observation, then the 
total posterior probability for g is given by 


-\nP{g\D) = (const) — InP(gi) — ^lnP(A|9) 


(14) 


= (const) - InP(gi) - + 1^9 ' Rtot • 9, 

Q^ 


<3.01 = E ^ 

^ i 


R 


■tot 


E 


QiQi 

P? 


R? 


(15) 

(16) 

(17) 


The posterior distribution is, ignoring the prior, Gaussian in g, with inverse covariance matrix 


and mean value 


Cg = Rt'oi 


9 = RjQtof 


(18) 

(19) 




2.2. Detection and selection 


Consider now the case where there is a galaxy present but we do not know its position xq in 
advance, so we need a prescription for choosing those locations xq about which we will compute 
moments. We need a detection process to decide if the galaxy has been observed within some small 
region about some position xq. Once a detection is made, we will also require some selection 
criteria to decide which detections will be used to constrain the lensing. In this section, we will 
continue to assume that the unlensed appearance G of the galaxy is known, but its location is not. 
At each potential source location, we end up with either a successful detection and selection, plus 
measured moments Af; or a non-selection. We therefore need to know P{M, s, d\G) = P{M, s|G) 
for the former case, and 1 — P(s|G) for the latter case, where s (d) indicates successful selection 
(detection). 

These probabilities are readily calculable if we make the detection and selection using the 
compressed quantities themselves. We add to our compressed data set the two weighted first 
moments of the source in Fourier space; 

A (■“:). (20, 

We choose as a criterion for detection of a source at xq that JA(a;o) = 0. Our choice of moments 
for M and X have these useful properties: 



dMf 

dxo 


(21) 

Cov{M,X] 

1 = 0 (for stationary noise), 

(22) 

J = 

dX 

dxQ 

= (M/ - Ml- Ml) /A = M^BM, 

(23) 


B = diag(0,1/4, -1/4, -1/4), 

(24) 


(J- 

1 = Tr(BCM) = 0, 

(25) 


The first line means that we detect a source at all stationary points of the function /(xq) = Tfj, 
the zeroth moment of the image as convolved with a filter defined by W{\k‘^\)/T{k). This filter 
will be broader than the PSF in any sensible application of BFD. The second property yields 
£(Af”,X"') = for our multivariate Gaussian noise distribution. The third property 

shows that the Jacobian determinant J of the positional moments X is purely a function of M, 
and hence statistically independent of X. 

The noiseless moments expected from galaxy G are now functions D^{g,u = xq — xq) and 
X^{g,u) since it is only the displacement u between the galaxy location and the Fourier phase 
center that matters. The detection condition is JA = X^ + X'^ = 0 so the probability of this 
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occurring in a small region A^x about xq is 

P{M, d\G, g, xg, xq) = C[M - M^{g, xq - ®o)] P [-X^{g, xq - * 0 )] 


dX 


dxo 

= C[M - M^{g,XG - aJo)] P - ®o)] 


P{xo) A^x (26) 
(27) 


In the last line, we take advantage of (23), assume a uniform prior P{xq) on the position of the 
detection, and note that the zero-mean, multivariate Gaussian will have C{—X'^) = C{X^). 

To eliminate noise detections, we will want to discard low-flux detections. We implement the 
selection criterion as membership in a subregion S of moment space: 


0 


P{M,s\G,g,XG,xo) = 

P{s\G,g,u = xg- xq) = £(X‘^)A^x [ 


s :/min < / < /max 

' C[M - M^)C[X^)\J\I\^x M€S 


(28) 

(29) 


M is 

dM C{M - M^)\J{M)\. (30) 

Mes 

For brevity we suppress the dependence of the template galaxy’s moments , X^ on the applied 
leasing g and on the displacement xq — xq between the galaxy position and the detection location. 


To render the integration in (30) tractable, we make the simplifying assumption that the 


Jacobian determinant J of the first moments is positive at any location where there is non-negligible 
probability of selection: 


J = M^BM = J'^ + 2 (M *^)^> 0. 


(31) 


Since J is the determinant of the 2nd derivative matrix of /, a restatement is that we are assuming 
the /(xq) surface is (nearly) always convex if /min < / < /max- To maintain this approximation we 
will need to avoid noise detections by raising /min ^ Sctj-, where we define 


cr'j — {Cm) ff 


(32) 


We discuss this convex-detection approximation in Section 5.3 With this approximation, we 


can integrate a multivariate Gaussian £ in Equation (30) analytically, obtaining 


BY BJY 

P{s\G,g,u) = C{X^)J^^x J^Y+ 2 {Cm^M^), — + {CmBC_ ' 


^ dfc 




dff 


G J 


/‘(/max fG)/^f o , 

Y={2Tr)-^/^ dve-'^ 

d { f min- fa) 


(33) 

(34) 


Now consider the joint distribution of the detection/selection outcomes at a grid xi,X 2 , ■ ■ ■ ,Xj .. 
of all search positions with non-negligible selection probability P{s\G,xg,xq = Xj). We assume 
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now that galaxies are uncrowded, in that no other galaxies contribute significantly to M or X at 
any location xj where galaxy G might he selected. At each search position, we either have a selection 
and a resultant M, or we have a non-selection. If the search region is contiguous, there can be at 
most one of the Xj with successful selection. This follows from our assumption that J > 0, which 
implies that that map xq ^ X is one-to-one over a contiguous region, so that X = 0 can only 
occur at a single xq. 

With this single-selection rule, we have two possible outcomes: 


1. A detection at a single location Xj yielding moments M, with probability P(Af,Sj|G) from 
Equation (29), or 

2. No detection at all, with probability 1 — using the selection probability in Equa¬ 

tion (33). 


Integrating over all possible detection positions, we obtain a total probability of outcome (I): 

P{M,s\G,g,XG) = J{M) J d\C[X^{g,u)]C[M-M^{g,u)]. (35) 

pa J(M) C [X^{g, u)] C[M- M^{g, u)] . (36) 

u 

In the second line, we change the integration to a sum over a 2d grid of points u with cell area 
A^m, since this is how we implement the integration over source position. We can truncate the 
grid where P{sj\G) becomes negligible. As expected, the resulting probabilities are independent 
of both the true position xq of the galaxy and the position Xi of the detection once the observed 
moments M are specified. 


The total probability of detection is obtained by similarly integrating Equation (33) over all 


u: 


P{s\G,g) = 


( ffY 

,g) = ^A2 u£(X^) |j(M^)y + 2(CMBM«)^ —+ (CmBCm)// 


d‘^Y 


G 


(37) 


remembering that , X^, fc, and the arguments to Y depend upon g and u. For a galaxy with 
flux fa that is many uj away from the selection boundaries, we have T —)• I. In this case it is 
easy to see that P{s\G,g) —)■ I, by recasting (37) as an integral over X^ —as long as J > 0. If the 


positive-J assumption does not hold, Equation (37) is incorrect, and we can have a mean number 


of detections per source that is > I. In Section [5^ we discuss our approach to mitigating failure 
of the positive-J assumption. 


2.3. Galaxy populations: postage stamp case 

Now we generalize from having a single galaxy type G to having G be an index into the entire 
catalog of possible galaxy images. We assume we know the prior probability pc that a galaxy is 
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of type G. If for example the galaxy library is approximated by the set of galaxies found in a 
high-5/A^ imaging survey of the sky, each detected galaxy would be assigned equal pc- 

First consider the artihcial case (commonly used in shear-testing programs) in which we know 
that exactly one galaxy has been placed in each of many disjoint “postage stamps” of pixels Di G D. 
In each stamp, we either obtain a selection with measurement of moments Mi at some location in 
the stamp, or we obtain a non-selection. The probabilities of these two outcomes are 


P(M„ s|g) = J{Mi) Y,PG^^uC{X^)C{Mi - M^), 

G,U 

-P(~ sis') = 1 - P{s\g) 

r BY B^Y 

P{s\g) = Y,pg^^uC{X^) J{M^)Y + 2 (CmBM«)/— + (CmBCm)// 

G,U L JG Jg- 


(38) 

(39) 

(40) 


These are the key equations for the BFD calculation. We have made implicit the dependence of the 
noiseless template moments M^ and X^ on the source position u and the lensing g. We define 
as before the Taylor expansions 


P{Mi, s|gi) ^ Pi + Qi ■ g + 


P{s\g)- Ps + Qs-g + 


1 „ 

2^ • Bi • g^ 

• Bs • g, 


(41) 

(42) 


where Qi = XgP{Mi, s), etc., are derived by propagating derivatives through to template quanti¬ 
ties M^ and X^. The detection probability P{s) is integrated over all possible selected moments 
and all possible galaxies G, so it does not depend on the data in stamp i, only on the noise level and 
PSF of the observation as manifested in the covariance matrix Cm in each stamp. For notational 
simplicity we will assume here that all stamps have the same noise level and PSF and hence the 
same Cm, but the formalism and our implementation allow for variation between targets. 


The combined probability of the output of the observation/detection/selection/compression 
process is 

PiD\g) = P{r. s\g)^-^ n P{M„s\g) (43) 

iGselections 


where Nns is the number of non-selected stamps, 
lensing variables, following Equation (15): 


We can now calculate the probability of the 


-In P{g\D) = {const) - lnP{g) - g ■ + -g ■ Ktot ■ g, 


Btot 




+ Xns 


QsQs I B^ \ 

(1-P,)2 ^1-Pj 


(44) 

(45) 


(46) 
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We now have all the tools needed to make a leasing inference from a postage-stamp data set. 
We assume that we have available a complete catalog of possible galaxies G and that for each we 
have a noiseless, unlensed image. In practice of course our template set will be a hnite sample 
from the (infinite) distribution of detectable galaxies. It is essential that the template set is a fair 
sample of all galaxy types that can meet the selection criteria with non-negligible probability. In 
other words we must know about galaxies that are outside the flux selection cuts by up to several 

The input data are: postage stamps of the “observed” galaxies, which we call the targets; 
low-noise postage stamp images of unlensed template galaxies to serve as our sample G; the PSF 
for each stamp; and the noise power spectrum for each stamp. Our testing assumes white noise, 
Pn = n. 

The procedure is as follows: 


1. Select a weight function W that will be applied to all targets and templates. The best 
choice will usually be a rotationally symmetric approximation to T{k)‘^Ig{k), where Ig is the 
transform of the unlensed, pre-seeing image of a galaxy of typical size in the survey. 

2. For each template galaxy G, measure the moments and under W for copies of 
the galaxy translated over a grid of xq centered on the primary flux peak. We can purge 
from the template set any that have negligible P(s|G). Further calculate the hrst and second 
derivatives of all moments with respect to g, using the formulae in Appendix [C| 

3. For each target galaxy: 


(a) Find the point(s) near the object centroid where the detection criterion X = 0 is met. 

(b) Calculate the moments Mi about the detection point(s) and discard those failing the 
selection cut on the flux moment. After this step we require no further aecess to the 
image data. 

(c) If no selection is made, increment the count Nns of non-selections, and continue with 
the next stamp. If more than one selection is made, choose the brightest and note that 
we have violated one of our assumptions! 


(d) 

(e) 


Calculate Cm for this stamp. 


For each target postage stamp i, calculate Pi = P{Mi, s\g = 0) from Equation (38), and 
also the derivatives under lensing Qi and R,;. Since this operation is executed for every 
target-template pair, it is the computational bottleneck of the procedure. The summand 
in (38) is simple, involving some 4-dimensional matrix algebra and one exponential, so 
is far faster than an iteration of a forward-modeling procedure. The {Pi,Qi,Ili} data 
fully encapsulate the lensing information from this galaxy and go into our catalog. 


4. Calculate the selection probability P{s\g = 0) from Equation (40), and its derivatives Qg, 
with respect to lensing. Note this needs to be done only once for each distinct Cm- 
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5. Sum the contributions to the posterior — hiP{g\D) from detections and non-detections as 
given in Equations (45) and (46). 

6. Add the Taylor expansion of any prior P[g) to Rtot- 

7. We now have the posterior log probability for g. The shear estimate and its variance are in 
Equations (19) and (18). 


2.4. Poisson-distributed galaxies 


Eor real sky images, we replace the postage-stamp distribution of galaxies with a Poisson 
distribution. We assume a total unlensed density n of sources on the sky, with probabilities pc 
of each galaxy being of type G. If our target survey spans solid angle H of sky, consider dividing 
this area up into regions of area AH larger than the selection region of any single galaxy, but 
small enough that nAH <C 1 so that we only have 0 or 1 galaxy in the region after running the 
detection/selection/compression process across the survey. The probability of obtaining a detection 
with moments iWj within any small sky area AH is 

P{Mi, s\g, AH) = ^ P{Mi, s\g, G)P(G| AH) (47) 

G 

= nAHP(Mi,s|g), (48) 


where we take P{Mi, s\g) from Equation (38). Similarly, the probability of selecting a source in a 
single cell 


P(s|si, AH) = raAHy~]pGP(s|G,ff), 
G 

= n AH P{s\g), 


(49) 

(50) 


where we use P{s\g) from Equation (40). The quantity nP{s\g) is the expected sky density of 
selected galaxies. It depends on g through the moments of the template galaxies, as per usual. 


Our total data D are reduced to a list {iWj,®*} for 1 < i < W of the locations and moments 
of the Ns selected sources; plus the information that there are no selections at any other locations. 
The total posterior for g is now 

Ns 

P{g\D)^P{g) [l-P(s|g,AH)] P(M„ AH) (51) 

non—detections 2=1 

Ns 

= J] P{Mi, s\g). (52) 

2 = 1 

The (AH)-^® term is independent of g and can be dropped. We retain dependence on n since we 
may wish to consider the source density as a free parameter along with g if we are simultaneously 








14 - 


constraining source clustering and shear. This posterior differs from the postage-stamp case only 
in the non-selection term. We replace (45) and (46) with 

- \iiP{g\D) = (const) - In P{g) - NslognP nQ.Ps - g ■ Qtot + ^9 ' R-tot • g, (53) 


Qtot — 
R-tot = 




QtQ 


Ri 


p2 P nflRs 


(54) 

(55) 


The operative procedure for inferring shear from a sky image is hence identical to that given 
for the postage-stamp case, except that of course we search the entire image for detections, not 
just the centers of each stamp. We use the above formulae in step instead of the postage-stamp 
formulae. 


2.5. Sampling the template space 


The BFD method depends upon approximating the full galaxy population with a finite sample 
of galaxies G from the sky. In essence we are approximating the continuous distribution of galaxies 
in the moment space with a set of Nq 5 functions at a random sampling from the distribution. 
The measurement error distribution C{M — M^) acts as a smoothing kernel over the samples. 
While the sums over G for Pi (and Qi, Rj) in Equation (38) are unbiased estimates of the complete 


integrals over moment space, there are two issues we must address. 

First, in producing Qtot and Rtot we divide Q* and R* by Pi. As noted in BA14, division by 
a noisy estimator for Pi produces a bias that scales inversely with the number of template galaxies 
contributing significantly to the Pi sums. The number of galaxies we can measure at sufficiently 
high S/N to use as templates will be limited by scarce observing time. Fortunately we can increase 
the density of templates in moment space by exploiting the rotation and parity symmetry of the 
unlensed sky: for each G that we observe, we can assume that rotated and reflected copies of this 
galaxy are also equally likely to exist. In practice we partition pc among such copies and add 


them to the template set. We will investigate in Section 5.7 the bias resulting from finite template 
sampling. 


Second: because our M consists of un-normalized moments, the spacing between template 
galaxies in moment space will become large compared to the measurement error ellipsoid described 
by Cm when we observe target galaxies at high S/N. Bright targets can easily end up with no 
templates for which C{M — M^) is non-negligible. Even worse, the Pi sum for a galaxy can be 
dominated by a single template that is many a away from the target in moment space, and this 
produces large derivatives in \nP{Mi, s) with respect to g, giving spuriously large influence in the 
final lensing estimator. It is further true that brighter galaxies are rarer on the sky, so our template 
survey will contain fewer sources with flux comparable to our brighter targets. 
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It is therefore advantageous to add noise to the moments measured for bright galaxies. One may 
question the sanity of adding noise to hard-won signal, but note that weak shear (magnification) 
measurements accrue uncertainty from the intrinsic variation of galaxy shapes (sizes) as well as 
from the measurement noise in these quantities. Typically, once S/N > 20, the intrinsic variation 
of the population is the dominant form of noise. So a resolved galaxy with S/N ~ 75 loses little 
lensing information if degraded to S/N ~ 25. However if we triple the noise, the likelihood function 
will “touch” 3^x more template galaxies in our 4-dimensional M space, so we can reduce template 
sample variance and bias by increasing noise. 

We must be careful to implement this process such that P{M,s\G,g) remains calculable for 
both the bright galaxies and faint ones. Again this is best done by using the moments themselves 
to decide whether to add additional noise. The procedure that we use is as follows; in Appendix [A| 
we present the altered formulae for P{M,s\G,g) that apply to the galaxies which have had noise 
added. 


1. We establish bounds fi and /2 on the galaxies to which we wish to add noise, based on 
comparing the density of templates with the covariance matrix Cm of the measured moments. 


2. We detect, measure, and select target galaxies the same way as described in Section [273 
the flux range /i < / < / 2 - 


m 


3. For each selected galaxy, we form a new moment vector M = M + Ma, with Ma drawn 
from a multivariate Gaussian with zero mean and predetermined covariance matrix Ca- We 
make no further use of the original moments M. 


4. We proceed with the analysis as before, with the exception that P{M., s\G, g) from Equa¬ 
tion (A6) is used in place of our previous P{M, s\G,g). Note the probability P{s\g) of galaxy 
selection in Equation (40) remains accurate, since selection is made before adding noise to 
the moments. 


More generally we may define a series of b flux bins by bounds fo, fi, ■ ■ ■, ft, and choose for 
each bin a distinct covariance matrix Ca for the added noise (presumably adding zero noise in 
the lowest-flux bin). For each target galaxy we calculate P{Ni, s\G,g) using the value of Ca 
we have applied. The non-selection term T’(slgf) is calculated using /min = /o, /max = ft- The 
only requirement on the added noise is that it obey the condition Tr(BCA) = 0 which holds for 
stationary noise. 


3. Implementation 

We have implemented the BFD shear inference in C-|—|- code. The computational bottleneck 
of the BFD method is the evaluation of P{M, s|G, g), which must be done for each target-template 





- 16 ~ 


pair. A survey like DBS might detect ~ 10®'^ galaxies, and use ~ 10“^'^ templates, each replicated 
over ~ different translations and rotations, leading to ~ 10^^ evaluations of P{M, s\G, g). 


Substantial speedup is attained if we can rapidly cull the templates to those which make 
significant contributions to the sums for Pi,Q^, and Rj, i.e. eliminate those highly suppressed by 
the Gaussian exponential in Equation (38). In this Section we describe some shortcuts to reduce 
the scale of the problem, and an efficient algorithm for culling the target-template pairs, which 
leads to an implementation that is feasible to run on modest present-day hardware for even the 
largest foreseen surveys. 


3.1. Computational shortcuts 

The target galaxies all have Ai = 0 by definition of the detection criterion, and so we may first 
eliminate any template with small £.{X^), a criterion we use to bound the displacements u at which 
we replicate the templates. Furthermore we have the freedom to rotate the coordinate axes for each 
target by the angle /3 which sets one of the ellipticity moments Mx = 0. We must rotate Cm into 
this frame, and make sure to rotate all the and Ri back to the original coordinate system after 
each is calculated. The unlensed population must be invariant under coordinate rotation, so we 
do not have to rotate the . With this procedure, we can prune the templates to those that 
are within ~ 6cr of Mx = 0. The space of template moments is now bounded to a small 

interval near the origin in 3 of its 6 dimensions. 


3.2. k-d tree algorithm 


In building the prior we need to efficiently identify template galaxies with moments that 
are close, in moment space, to a given target galaxy M. The relevant equation is 


X^^{M - M^) {M - M^) < 


cr„ 


(56) 


We must be careful in choosing Umax so that truncation of the integral does not bias g; but the 
number of sampled template galaxies, and the execution time of the measurement, will scale as 




We choose to store the moments of the template galaxies in a k-d tree (Bentley 1975), which 


partitions the templates into distinct A:-dimensional rectangular nodes that allow for fast lookup 
of points satisfying (56). The /c-d tree is built by assuming a nominal covariance matrix Cat 


that is close enough to the Cm of the targets that the set of templates satisfying (56) with Cat 


includes all those which do for Cm, and not many more. To reduce the number of computations, 
we do a Cholesky decomposition C]^^ = A^A, and rescale the template and target moments to 
N = AM,N^ = AN^. This transformation yields = \N — the Euclidean distance in 

N. The N are used only to isolate the relevant templates, not to calculate the probabilities. 
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We need to replicate each template at a grid in u and rotation angle. The step sizes in 
translation and rotation are chosen such that shifts by ~ (Tstep ^ 1 between each grid point. 
Parity-reversed copies are also made. The probability pc of each template is shared equally between 
its copies. We discard template copies that have no chance of satisfying Equation (56) for any 
selected target galaxy (remembering that all selected targets have X = 0, Mx = 0, and /min < 

< /max)- 


The derivatives of with respect to shear are calculated for all retained templates. If 
all the target galaxies have the same covariance matrix, a number of numerical factors can be 
precomputed so that they do not need to be recalculated for every template/target pair. Note that 
a new template set needs to be constructed, and the k-d tree partition repeated, if the target Cm 
changes by more than ~ 10%. The construction of the template tree scales as NclogNc, where G 
is the number of templates, which is subdominant to the time NtNc for integrating the A/ targets 
over the template set. 


After the tree has been constructed, we find for each target galaxy all the nodes that contain 
template galaxies with < Umax using the nominal Cat- If the number of templates in the retained 
nodes exceeds SAgampie; we randomly subsample a fixed number Agampie of them according to their 
probabilities pc- This keeps us from wasting time calculating huge numbers of template/target 
pairs for targets with large uncertainties, while making full use of the templates that resemble the 
rarer targets. With this list of template/target pairs, we can calculate the P,Q, and R values 
needed. The speed of the integration step now scales as A/Wampie if the number of templates 
becomes large. 


Our implementation executes the integration over templates for ~ 10 galaxies per second 
per core on a general-purpose cluster, for the GalSim simulations below in which each target 
is compared to ss 40, 000 templates. At this speed, a 1000-core cluster could measure 10® target 
galaxies {e.g. the LSST survey) in just 1 day, probably much faster than the subsequent cosmological 
inferences will require. 

While the BED method has no parameters to tune to reduce bias, the sampling/integration 
algorithm has three free parameters—UmaxjUstep; and Wampie—which trade computational speed 
and memory requirements against the bias induced by finite sampling. The number Nq of templates 
sampled from the sky also will be important in controlling finite-sample biases. 


3.3. Weights and PSFs 

The weight function W(||) used in calculating the moments of Equation ([^ must satisfy two 
requirements; first, it must vanish at any k where T{k) = 0, in order to keep measurement errors 
finite; and it must have two continuous derivatives in order for the shear derivatives of the template 
moments to be calculable (see Appendix]^. With these conditions satisfied, BED is well-defined 
and unbiased, but further refinement of W can optimize the noise on the inferred g and the required 
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size of “postage stamp” of pixels for the DFT around each galaxy. In our validation tests we use 
this “/cfj” weight function: 


W{\k^\) 


2N J 


N 


0 


k<^ 

k>^ 

— a 


(57) 


with = 4. This closely approximates a Gaussian with width (in k space) of l/ci, but goes smoothly 
to zero at finite k. 


In our validation tests we assume we have a noiseless, Nyquist-sampled postage stamp of the 
PSF from which we can measure T{k) on a discrete grid of k. If we require T at other values of k, 
we interpolate the prescription for zero-padding in real space and quintic polynomial interpolation 
in fe-space given by Bernstein &: Gruen (2014). This need arises if there is distortion across the 
image such that either targets or templates are sampled at slightly different pitch than the PSF. 


4. Validation 


To verify that our implementation of BFD can infer shear with an accuracy of \m\ < 10“^, 
we use two types of simulated data. The “Gauss tests” use Gaussian galaxies, a (i-function PSF, 
and a Gaussian VF(|fc^|), in which case we can calculate all moments and their shear derivatives 
analytically—no rendering of images is done, so this is fast and bypasses any issues related to 
image discreteness. The second validation test uses simulated galaxy images produced with the 
Python/G+-|- software GalSim (Rowe et al. 2015),^ 


Table gives the parameters of the two validation simulations. While they use different 
methods to generate “observed” moments for the target and template galaxies, they use the same 
integration code. Both simulations proceed as follows: 


1. A common galaxy generator is used to generate target and template samples, with shear and 
noise being applied only to the targets. The galaxies are sampled from a uniform distribution 
in S/N (Gauss test) or flux (GalSim test) between specified limits. The galaxy half-light 
radius r^Q is also drawn uniformly between two bounds. The (unlensed) ellipticity e = (a^ — 
lP‘)/{o?‘ + b‘^) of the source is drawn from the distribution 

P{e) oc e(I — e^)^ exp (58) 

and the galaxy position angle is distributed uniformly. Galaxy origins are randomized with 
respect to the pixel boundaries (if any). 


® https: / /git hub .com/ GalSim-developers/GalSim 
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2. A “batch” of measurements is made by generating A^batch target galaxies with a constant 
shear g, adding noise, and measuring moments about the origin which yields X = 0. Those 
passing any selection cuts are integrated against ^template template galaxies drawn from the 
same generator, each of which is translated, rotated, and reflected as described above. The 
Ptot, Qtot) Rtot for the batch are saved. 


3. Batches are processed until we have generated the desired number Nt of target galaxies. Note 
that each batch draws an independent set of templates. The hnal shear estimate and its 
uncertainty are derived from the summed P,Q,Ii using Equations (19) and (18). 


4.1. Gauss tests 


We use the analytic moments of the Gauss tests to check the BED formulae and their imple¬ 
mentation, and explore the sampling parameters of the integration algorithm. Table describes 
the baseline simulation; in Section we investigate dependence of shear bias on these parameters 
using the Gauss tests. Although the moment calculations are analytic, we use the full k-d tree 
implementation described in Section 3.2 to evaluate the integrals. We can quickly run a sufficient 
number of statistics to reach the accuracy of m ~ 10“^ using these analytic simulations. 


Galaxy moments (and their shear derivatives) are calculated analytically, and the mo¬ 
ment noise AT” is generated from the multivariate Gaussian distribution with the known Cm- A 
complication is that the moment noise is held fixed as we shift the target coordinate origin to null 
the X moments. This is contrary to the behavior of normal images, and results in some changes 
to the formulae for P{D\g) which are described in AppendixThe baseline Gauss test with 10® 
targets yields m = (-1-0.1 ± 0.4) x 10“^. 


4.2. GalSim tests 

The GalSim tests validate several aspects of the code that are not exercised in the Gauss 
tests, primarily the measurement of moments and PSFs from pixelized images. The GalSim code 
is used to produce FITS images, each consisting of 100 x 100 postage stamps that are 48 x 48 pixels 
in size. Every stamp contains one galaxy located near its center. Each galaxy is the sum of an 
exponential disk and a deVaucouleurs bulge. Both components are given the same ellipticity and 
half-light radius. The fraction of flux in the bulge component is uniformly distributed between 0 
and 1. The center of the bulge is randomly shifted with respect to the center of the disk by a 
distance up to the half-light radius. For target galaxies, we apply a leasing shear g. We convolve 
the hnal galaxy with an elliptical Moffat PSF. If the galaxies are being used as targets, Gaussian 
noise is applied to the hnal stamp image. A selection of targets and templates is shown in Figure 

The range of hux assigned to galaxies is set such that it yields 5 < S/N < 25 for a circular 
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Table 1. Parameters and results of the baseline validation tests 


Characteristic Gauss test GalSim test 


Galaxy profile 
PSF profile 
PSF size (pixels) 

PSF ellipticity 
Weight function 
Weight size 

Galaxy radius^ 

Galaxy S/N 
cJe, galaxy shape noise 
Selection cuts 

^^batch / ^template, target/templates per batch 
o'max / Cstep) template truncation/replication 
-^sample; templates subsampled 

Nt, total targets 
Selection fraction 
Qtrne^ input shear 

(fl'meas fi^true) ^ 

Non-linearity a 


Gaussian 

Decentered disk-l-bulge 

(5-function 

Moffat, j3 = 

= 3.5 


^50 = 1-5 



(0.00,0.05) 


Gaussian 

ka, eqn. ( 

57 

) 

(7 = 1 

a = 3.5 pix 


0.5-1.5 

1.0-2.0 


5-25 

5-25 



0.2 

0.2 



none 

8 < S/N < 20 

10® / 3 X 10^ 

5 X 10® / 2.5 X lO'^ 

5.5 / 1.0 

6.0 / 1.1 


7 X 10^ 

5 X 10^ 



109 

10^ 



1.0 

0.69 



(0.01,0.00) 

(0.02,0.00) 



(-F0.1, -FO.O) ± (0.4,0.4) (-^4.3, -1.3) ± (0.9, 0.9) 

2 0.5 


^Galaxy half-light radius is given relative to the weight scale for Gauss tests, or relative to the PSF 
half-light radius for GalSim tests. 
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Fig. 1.— A sample of the target (left) and template (right) simulated galaxies used in the validation 
test. Targets are marked here with an X in the upper left of their stamp if they were cut for low 
(red) or high (blue) S/N. 


galaxy of typical size under matched-aperture detection. In measuring shear, we set selection 
bounds /min = 8(7/,/max = 20 ( 7 /. Note that the selection uses a different definition of S/N than 
the generation. At fixed flux, the selection favors more compact and more circular galaxies. 

The properties for these simulated galaxies were chosen to capture the non-idealities of real 
data which might affect the BFD implementation: 

• We give the PSF an ellipticity 62 = 0.05, which will test our ability to reject PSF asymmetries. 

• The Moffat PSF is not strictly band-limited so the data are slightly aliased. The PSF half- 
light radius of 1.5 pixels yields a sampling equivalent to DES imaging in seeing with FWHM 
of Cf.^ 8 , which would be in the worst-sampled quartile of the data. 

• The decentering of the disk and bulge components breaks the perfect elliptical symmetry of 
the galaxies, which might otherwise be canceling some systematic error in the method. 

• Elliptical Gaussians are a six-parameter family, and hence a given point in the 6 d (Af^, X^) 
space has only a single possible value for the shear derivatives. The varying bulge fraction 
and bulge/disk misregistration in the GalSim simulations admit a range of shear derivatives 
at each point in moment space. 
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• These tests include a non-trivial selection function and hence test the validity of the BFD 
terms for non-selection. 


We produce a total of Nt = 8.6 x 10® targets, of which a fraction 0.69105 pass the flux selection 
test. The calculated P{s) from Equation (40) predicts this extremely well: 0.69111 ±0.00006. The 
uncertainty on P{s) arises from sampling noise in the template set. 


Most importantly, the inferred values for gi and g 2 imply 

(59) 

(60) 


m = (±2.1 ±0.4) X 10“®, 
c= (-1.3 ±0.9) X 10"^ 


We detect a deviation from m = 0 at 5(T signihcance: well below that demonstrated by any 
previous practical method. The c value is within 1.5a of zero, and suppresses the input PSF 
ellipticity by a factor of > 3000. 


If we omit the selection terms in Equations (45) and (46), we obtain m = —0.0122±0.0004. The 
selection term is clearly necessary for part-per-thousand shear inference, and the BFD formalism 
appears to calculate the correction to 20% accuracy or better. 


Lastly we can assess the accuracy of the code’s internal estimates of the uncertainty on the 
shear estimator. The standard deviation of the g components derived from each batch of targets is 
(3.56 ±0.04) X 10““^, consistent with the internal error estimate from Equation (18) of 3.58 x 10“^. 


5. Testing approximations 

We collect here all the assumptions and approximations that have been made in deriving the 
lensing inference formulae: 

1. We have implicitly assumed that we know I{k) at all values of k with non-vanishing T{k), in 
other words that we have a Nyquist-sampled real-space image. 

2. The pixel noise D” is stationary and independent of the underlying galaxy G, and the moment 
noise likelihood is a multivariate Gaussian. 

3. The Jacobian determinant J = \dX/dxQ\ is positive at any location where there is non- 
negligible probability of selection 

4. Galaxies are uncrowded, in that no other galaxies contribute significantly to M or X at any 
location x where galaxy G might be selected. 

5. The lensing is weak, so that a second-order Taylor expansion about g = 0 captures all 
information about P{M\g). 
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6. Our template set G is a complete sample of source galaxies. 

7. We have a noiseless, unlensed image of each template. 

In this Section we will describe our progress to date in verifying that failures of these assumptions 
or approximations will not stand in the way of achieving part-per-thousand inference of g. Also: 
our GalSim simulation results, while very good, are still imperfect, with m measured 5a deviant 
from zero. So we are interested in whether any of these approximations could be responsible for 
this deviation. 


5.1. Nyquist sampling 


We have implicitly assumed Nyquist sampling of the data by defining our moments as integrals 
over the regions of k space with non-vanishing W(|/c^|). We will not in this paper examine the 
consequences of aliasing in the data due to hnite sampling. We do note, however, that the method 
does not require that the data be available at all k or even that it be free of aliasing. We can define 
our M elements to be sums over a finite sampling of k space, and the formalism remains valid as 
long as we know what the template galaxies’ would be under the same sampling, and also 
know the hrst two derivatives of with respect to lensing distortion g. This is true even in the 
case of aliasing, as long as the templates are aliased in the same way as the targets. This, however, 
is hard to arrange in practice, and it is better to construct un-aliased data from dithered images if 


necessary, as described in a simple case by Lauer (1999) and in a more general case by Rowe et al. 


Note also that the Moffat PSF used in the validation tests of Section is not strictly band- 
limited, so these tests incurred a level of aliasing that would be typical for a well-designed ground- 
based survey. In future tests we will evaluate whether this aliasing, or some other approximation 
in the GalSim rendering, is causing the non-zero m value in the GalSim tests. 


5.2. Stationary noise 

The assumption of stationary, source-independent noise is valid for background-limited (or 
read-noise-limited) imaging, which will generally be the case for the galaxies dominating the lensing 
information in ground-based weak-lensing surveys. We leave for future work the investigation of the 
impact of shot noise from source photons, which may be relevant for low-background space-based 
surveys. Our current simulations do not include source shot noise. 
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5.3. Convex galaxies 

The assumption of positive Jacobian determinant J for all selectable regions was necessary to 
render as feasible the analytic integration of selection probability, and also to avoid calculating the 
joint probability of multiple detections of the same source. We consider two potential modes of 
failure of this approximation. 

First, if the galaxy is sufficiently faint, is small enough even near its peak that the noise 
in M can flip the sign and create a fold in the x ^ X mapping. Clearly the defense against this 
is to have the selection threshold /min be large enough (> 5 cj/) that is also large enough to 
dominate the noise fluctuations. Further work is needed to determine if there is a level of /min 
which satisfactorily suppresses noise detections without discarding sources that carry a significant 
fraction of the lensing information. 

Second, there will be galaxies which have high flux but have complex structure such that 
crosses or approaches zero because of multiple maxima or plateaus. We should note that we only 
care about the structure in the galaxy after it has been smoothed by the detection filter, which 
in real space is the Fourier transform of W{\k‘^\)/T{k). We will usually aim to have W ~ T^I^, 
where I^{k) is the transform of the average observed galaxy. Thus in practice, the observed image, 
already convolved by the PSF, is convolved again with the PSF and the typical galaxy profile before 
running the detection scheme. This means that any maxima or plateaus on scales of the PSF or 
smaller are going to be erased. 

We have not yet validated BFD on sources with well-resolved structure that might lead to 
multiple selections, but our implementation includes some ameliorative measures in anticipation of 
the issue. First, in our postage-stamp tests, we can discard all but the highest-flux detection in each 
stamp. Our calculation of P{M, s\G, g) should then incorporate the probability of a higher-flux 
selection existing. Our crude version of this is to include in our sum over u only those values of u 
that have > 0 and are contiguous with global maximum for /. In other words we assume that 
the brightest detection will arise from the convex region of the (hltered) galaxy that surrounds the 
global maximum. We have not yet quantified the efficacy of this approach on realistic galaxies. 


5.4. Uncrowded galaxies 

Overlapping galaxies pose a considerable challenge for BFD (and indeed for nearly all lensing- 
measurement methods). We have strived for a formalism that makes minimal assumptions about 
the morphology of the galaxies. But galaxy image deblending depends fundamentally on having 
some prior expectations for galaxy morphology in order to partition the flux in a single pixel among 
two (or more) sources. We suspect that for mild cases of blending, one could precede the BFD 
analysis with joint model-fitting to multiple overlapping sources; and then, subtract each source 
model in turn when measuring the Fourier moments of the other. This would likely be successful 
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as long as the subtracted flux has moments that are small compared to the remainder, as our 
dependence on the correctness of the model will remain weak. At present, we will simply ignore 
galaxies which overlap to an extent that they grossly perturb each other’s moments. Crowding 
remains as a critical issue for deep ground-based surveys, where the product of typical observed 
galaxy size and desired target number density is > 0.1 (Chang et al. 2013). 


5.5. Weak lensing limit 


Any shear estimator that is analytic in the input shear and introduces no preferred direction 
on the sky should have 

(fl'meas “ 9tvue) = [m + Og'^ + 0(/)] ^true' (61) 

The coefficient a is expected to be of order unity unless dlogP/dg becomes large for some targets. 
This will occur only for galaxies whose moments are many a different from any of the templates, 
a situation we avoid by adding noise to high-S'/A^ targets. If a ~ 1, then the desired accuracy of 
< 10“^ of the shear will be lost in the second order Taylor expansion about g = 0 for g > 0.03. 
Expanding around <7 7 ^ 0 greatly complicates the calculation of the moments and their derivatives, 
making it impractical. BA14 derive third-order expressions, but these are not included in our 
present implementation and would also slow the method substantially. 


Figure shows the recovered multiplicative bias as a function of input shear in tests for non¬ 
linear behavior of both the Gauss and GalSim tests. The bias is consistent with the expected 
quadratic growth with g, with a ~ 2 and ~ 0.5 in the two case. A similar result is obtained for 
a model-fitting implementation of the BA14 by Sheldon (2014). The value of a clearly can vary 
based on the nature of the galaxies and the noise levels. The a = 0.5 nonlinearity contributes an 
apparent multiplicative error of ag'^ = 0.2 x 10“^ to our principal GalSim test results, smaller 
than the measurement error even with 10® target galaxies. But if real cosmic-shear measurements 
have a value closer to the a = 2 seen in the Gauss tests, the nonlinearity cannot be ignored: The 
cosmic-shear test is, at its most basic, a measure of the RMS dispersion ag of the point distribution 
function (PDF) of shear to z ~ 1 sources. Propagating (61) through a nearly-Gaussian PDF 
suggests that we would mis-estimate ag by a factor 1 -|- 3aag, which for ag « 0.02 and a = 2 would 
be a fractional error of 0.0024 on ag, larger than the expected statistical error for future surveys, 
and in need of a correction. Fortunately the value of a is straightforwardly assessed to the ~ 20% 
accuracy that would be needed to render nonlinearity errors negligible. 


The nonlinearity poses a potentially larger problem for regions of high shear such as galaxy 
clusters. This problem can be overcome with an iterative procedure if we are htting a model to the 
shear. We first fit the model, ignoring nonlinear shear response. In regions of not-so-weak shear, 
e.g. where the shear is ~ 0 . 1 , we can unshear the source galaxy by the predicted 0.1 shear before 
measuring its moments and calculating Pi,Q^, and Rj with the nominal second order procedure. 
This will yield a Taylor expansion of P{Di\g) of deviations from the model, which can then be 
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used to refine the model. We speculate that this procedure would recover full unbiased accuracy 
around well-measured individual clusters. 
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Fig. 2.— The recovered multiplicative bias using the second-order BFD formalism, as a function 
of input shear. The grey band shows the desired accuracy of m < 10“^. The dashed line is a fit of 
the data to the expected quadratic dependence on g, with coefficient a « 2 for the Gauss tests and 
a ~ 0.5 for the GalSim tests. 


5.6. Noiseless, unlensed templates 

We have assumed that the galaxies used in constructing the prior are noiseless and unlensed, 
whereas in real data they will be both noisy and lensed by large scale structure. We run a series of 
Gaussian simulations to evaluate the impact of each of these on shear measurement. In the first test, 
we add noise to the moments and derivatives of the template galaxies. Figure shows the bias in 
recovered shear as a function of the ratio of template noise variance to target noise variance. When 
this ratio is ~ 10%, the multiplicative bias remains < 10“^. Therefore, observations with > lOx the 
integration time of target galaxies are sufficiently high-5/A^ to use as “noiseless” template galaxies. 
Most current and future lensing surveys already include such deep observations to maximize their 
scientific value. 

In the second test, we assess the impact of using template galaxies that have already been 
sheared. Recall that we use rotated copies of all observed templates, which means that the mean 
shear on our templates is always zero, but we must ask whether non-vanishing variance of the 
shear on the templates produces biased shear inferences. We applied shear to the templates in two 
different ways: a constant shear amplitude for every template; and a shear randomly drawn from a 
zero mean Gaussian with dispersion ag. Figshows the bias as a function of the applied template 
shear. The multiplicative bias satisfies \m\ < 10“^ in both cases if the RMS template shear is 
< 0.04. The typical shear imparted by large scale structure is only half as large, thus it appears 
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feasible to use deep integrations of the real sky to produce the template set. 
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Fig. 3.— At left; the multiplicative bias in shear inferred from simulations where we add noise to 
the template galaxies. The x axis gives the noise variance on the templates relative to the noise 
variance on the target observations. At right: the multiplicative bias recovered when template 
galaxies have an applied shear that is either constant or drawn from a Gaussian of given RMS 
value. The grey band shows the desired accuracy of m < 10“^, which we see is retained when the 
templates have < 10% of the noise power of the targets, and the RMS shear on templates is < 0.04. 


5.7. Complete template set 


We approximate the integral over all possible template galaxy types and locations with a finite 
number ^template of high-^/A" galaxies, and by using a finite number of copies of each at intervals 
of cTstep in translation and rotation. Further we subsample a number Agampie of the resultant copies 
that lie within of each target. Ideally, the variance due to these approximations will be far 

below the expected noise of the targets. BA14 suggest that BFD estimates will have a bias that 
scales inversely with the number of templates. We again use the Gauss tests to see how sensitive 
the recovered shear is to Atempiate and Agampie- Figure shows that, for the Gauss tests, the bias 
is within \m\ < 10“^ when we have at least 30,000 template galaxies and Agampie ^ 70,000. The 
necessary values will in practice depend on how the galaxies are distributed in moment space, how 
noisy they are, and how we implement the added-noise strategy of Section 2^ These tests suggest, 
though, that a sample of 10^-10^ deep sky templates will suffice. This is readily attainable in all 
planned surveys. 


The GalSim tests reported in Section|^used a value of Atempiate = 2.5 x 10^. Tests with a 2.5x 
smaller value show no significant change in m, arguing against the hypothesis that our non-zero m 
value is attributable to insufficient template sampling. Figure shows that for the GalSim tests, 
we integrate over a similar number of templates (~ 40, 000) for each target. Note that in this case 
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-^samp was 50,000, so we never invoked the subsampling of accessible templates. An unrealistic 
aspect of this test is that galaxies are uniformly distributed in flux, leading to uniformity in the 
number of templates. The real sky has fewer bright galaxies and we would expect the template 
count to increase for fainter sources. 



Number of Templates 



Fig. 4.— The multiplicative bias recovered as a function of the number of initial template galaxies 
before adding rotated/translated copies (left) and as a function of Agampie (right), the number of 
subsampled templates used to evaluate the integrals for each target galaxy. The grey band shows 
the desired accuracy of |m| < 10“^. 


6. Future developments 

The BFD formalism can be straightforwardly extended beyond the basic single-image, single¬ 
plane shear-estimation implementation that we test in this paper. In this section we sketch some 
of these possibilities. 


6.1. Interferometric data 

Interferometric data is collected in Fourier domain; data for a galaxy will consist of estimates 
of I{k) (the visibilities) at a finite sampling of k values determined by the interferometer baselines. 
As noted in Section [5.1[ it is not required that we measure galaxies at all k: we can replace the 
integrals over (Pk in Equation Q with weighted sums over the visibilities. The sole requirement is 
that we be able to calculate the same sums for the template galaxies, as well as derivatives under 
lensing distortion. 
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Fig. 5.— The number of templates used in the integration of P{Di\g) for each galaxy as a 
function of its flux moment Mj in the GalSim tests. The central line is the median, the shaded 
region bounds the 10-90 percentile range. 

6.2. Multi-image analysis and multi-band data 

Equation Q defines our compressed measurement vector M as derived from a single image 
I{x) observed with a single PSF T{x). In most surveys, observations of a given target will be 
spread over multiple exposures i G {1, 2,..., A^}. As long as each individual exposure is unaliased, 
we can define the moments of the target as a weighted sum over the moments Mi measured on 
each exposure: 

M = WiMi. (62) 

i 

(and likewise for the detection moments X). It is important that the Wi be determined independent 
of the observed properties of the galaxy, so that the probability P{M\G) remains calculable. This 
linear combination of the Mi yields a zero-mean normal distribution for the moment noise, with 
covariance matrix Cm that is the sum of those for the individual exposures weighted by wf. Our 
implementation of this extension selects the weights to minimize the variance of the ellipticity 
moment M+, a process which depends only on the noise level and PSF of each exposure. 

Formally, we can choose a different weighting function iy(|fc^|) for each exposure, as long as 
we can calculate the M^ that would result for each template (and its lensing derivatives). It is, 
however, convenient to use the same W for all exposures, simplifying construction of the template 
set. If the seeing conditions of the exposures vary widely, then a single W may be far from optimum 
for some exposures; but since poor-seeing exposures carry less lensing information to begin with, 
we lose little by selecting W to optimize the use of exposures at median or better seeing. 

Because we make use of un-normalized moments, it is important that all exposures (including 
templates) be placed on a common photometric scale. 
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6.2.1. Multiple observing bands 


There is also no requirement that all exposures be taken through the same hlter. Equation (62) 


can refer to exposures in multiple filters. We must once again select weights in advance—iterative 
procedures such as weighting each galaxy according to its observed colors result in P{M\G) func¬ 
tions that are analytically intractable. Choosing fixed weights to apply to each filter is akin to 
measuring moments in a bandpass that is the weighted sum of all the filters’ bandpasses. 

Another alternative would be to dehne the moment vector M to be the concatenation of 
moment vectors from each filter. This retains more information, though at the cost of higher 
memory and computation demands due to the higher-dimensional moment space. We advocate 
a hybrid procedure, in which we retain distinct flux moments for each filter i, but retain 
only a single weighted combination of the other moments M^, M^, X ■ This is because shape 


information is generally highly degenerate between bands (Jarvis &: Jain 2008), but colors carry a 


lot of information. For example, red galaxies have a more compact shape distribution than blue at 
low redshift ( ]Bernstein fc: Jarvis 2002), so retaining color information when we compress the pixel 
data allows the BED formalism to exploit this distinction for more precise shear inference. 

Again the key requirement is that a low-noise measure of the template moments be 
available, as is the case if the templates are observed in all of the same bands as the targets. One 
can select distinct W functions for each band, as long as the targets and templates are treated 
consistently. An advantage of a hxed W across bands is that the resultant flux moments then have 
the same pre-seeing window function on the galaxy in all bands. This property, also attainable 


with PSF-matching codes, or the GaaP algorithm of Kuijken (2008), is desirable for use with 


photometric redshifts, since it insures that the measured colors correspond to a fixed weighting of 
the stellar populations in the galaxy, i.e. we are not mixing aperture effects with stellar evolution. 


6.3. Star-galaxy discrimination 

Stars are Dirac 5 functions in real space, so their moments are known functions of flux and 
position xq. Furthermore they are unaffected by cosmological-scale leasing so we set the leasing 
derivatives of to zero. If we add stellar sources to our template set, assigning them pc values 
according to a prior expected sky density vs flux, then we automatically correct the shear estimate 
for dilution by stellar sources. We obtain as a by-product an excellent posterior estimate of the 
probability that each source is stellar, and we can sum these to obtain a posterior stellar density 
estimate which may help to refine the stellar model that led to the prior. 

It should also be noted that faint galaxy targets which might be confused with stars are by 
definition weakly resolved, and contribute very little to Qtot Rtot for shear estimation. Hence 
the shear estimation will have low sensitivity to mis-estimation of the stellar density in the prior. 

If we have observed in multiple filters and retained flux moments Mj^i in each band as described 
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in Section 6.2.1, we can (and must) produce stellar templates across the color-magnitude diagram, 
with pg values expressing the expected density vs color and magnitude. 


6.4. Magnification 


The BFD formalism makes no assumptions about the nature of the lensing distortion vector 
g, except that we can simulate its action on each template, and that P{D\g) is well approximated 
by a quadratic Taylor expansion. This means that we can include magnification p along with the 
shear components gi and 52 with essentially no change except to increment the dimension of the 
Q and R derivatives. 


Huff & Graves (2014) note that early-type galaxies define a narrow plane in the space of flux, 
size, and concentration, which then enables much-enhanced determination of magnification. Our 
current BFD implementation would not exploit this gain since our compressed data vector lacks 
information on concentration. This could be remedied by adding a |/c^| moment to M. Furthermore 
we would need color information, i.e. the series of flux moments proposed in Section 6.2.1, to 
distinguish red galaxies in a desired redshift range. There is no need to “teach” BFD about the 
existence of the Huff-Graves relation. Any such relation that exists will be automatically exploited 
in the lensing constraints, as long as the action of lensing produces a shift in the way the galaxies 
populate the moment space. 


Two minor technical points about the estimation of magnification: first. Equation (53) has 
assumed that galaxies are placed by a Poisson process. Glustering of sources will mean that 
the resultant posterior is invalid, underestimating the uncertainty on g. We can, if desired, treat 
the unlensed density n as a spatial variable when inferring magnification statistics, to distinguish 
clustering from magnification. Second, recall that magnification will dilute the source population 
on the sky, changing the apparent n. This effect is already included in our implementation because 
we calculate the derivative dM^/dg by magnifying about the coordinate origin, not the center of 
the galaxy. This means that our grid u of template copies is dilated by magnification, but we do 
not alter A^n in Equation (38) or Equation (40). Thus source dilation is included in the P terms, 
and the n term should retain the unlensed density. 


Higher-order lensing distortions, i.e. flexions, can similarly be constrained by the BED method, 
again as long as we augment the compressed data vector to include quantities that are altered at 
first order by the distortion. Since flexion is not an affine transformation, its action on the Fourier 
domain I{k) is less easily expressed than shear and magnification. Nonetheless, it is possible 
to derive flexion derivatives of template moments for simultaneous constraint of all these lensing 
distortions using BFD. 
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6.5. Lensing tomography and photometric redshifts 


One important caveat to BFD is that one cannot select subsets of the targets and then combine 
their Pi, values to estimate the shear on this subset. This would invalidate the P{D\g) 

formulae we have derived, unless one can guarantee that the post hoc selection criteria do not at 
all alter the distribution of underlying moments of the selected galaxies. 


Many useful scientihc inferences and diagnostic tests for weak lensing measurements rely upon 
comparing g on subpopulations of the sources. Most critically, the bulk of the lensing informa¬ 
tion, plus constraints on contamination by intrinsic galaxy alignments, require splitting the source 
population into redshift bins, a.k.a. lensing tomography. 


If precise redshift estimates are available for all targets and templates, then the application of 
BFD is straightforward, as we compare each target only to templates that reside in the same redshift 
bin. More commonly we have probabilistic redshift estimates for targets derived from photometric 
redshift (photo-z) estimation. Partitioning target or template galaxies by their maximum-likelihood 
redshifts will not, in general, yield valid BFD inferences on the shear in each redshift binj^ 


This apparent stumbling block turns out to be an opportunity: the BFD formalism contains 
within it an ideal Bayesian photo-z estimation mechanism, particularly for the sources with modest 
S/N < 30 photometry that dominate the weak lensing information in most surveys. Benitez (2000) 
presents the formalism for Bayesian inference of redshift from broad-band fluxes; like BFD, it relies 
upon having noiseless data vectors for a sample of “truth” objects of known prevalence on the night 
sky. 


We generalize the BFD method as follows: the lensing vector is extended to the tomographic 
information t = where g^^ is the lensing distortion applied to sources in redshift bin 

u out of Z total bins. We want the posterior P{t\D), and as before we compress the image data 
D into the moments Mi of objects detected and selected at positions Xi. The posterior on t is 
calculable once we have expressions for P{Mi, s\t) and the total selection probability P(s|t). If 
we know the probability pcu that template galaxy G is in redshift bin v, then we have the clear 


generalization of Equation (38) to the tomographic case: 


P{M,s\t)= ^ pGA^u'^pGuP{M,s\G,u,g,,) 
G,Xc 

P{M,s\G,u,g,) = |J(M)| C [X^{u,g,)] C[M - M^{u,g, 


(63) 

(64) 


In the second line we make explicit the dependence of the template moments on its true position 
u relative to the detection location and upon the shear g,^ to the source. The total probability of 
detection vs t is similarly obtained by introducing pGu into Equation (40). 


®The same is true for most other lensing-inference methods: the responsivities or empirical calibrations they 
employ will depend upon the galaxy selection in subtle ways that may thwart part-per-thousand calibration. 








33 - 


We can also easily calculate the posterior redshift distribution P(z^|iWj) for each source, which 


would be found equivalent to the treatment of Benitez (2000). Of course our redshift discrimination 
will be weak unless we have measured flux moments Mfj in multiple bands j as described in 


Section 6.2.1 where we noted that our pre-seeing aperture-matched fluxes are ideal for photo-z 
purposes. We also note that we are working with fluxes, not colors, and therefore we automatically 
include the “luminosity prior” that is often added by hand into photo-z estimators. Indeed our 
inclusion of Mr,M+, and Mx means that we automatically exploit any size, surface brightness, or 
ellipticity information that helps with redshift discrimination. 

Extending BED to return the full tomographic lensing likelihood P{t\D) would have many 
advantages for precision lensing cosmology. It would allow us to extract all the available lensing 
information from galaxies with low-resolution photo-z information due either to color ambiguities 
or low S/N. It eliminates the need for post hoc estimation of selection biases induced on the lensing 
estimators by photo-z cuts. An important issue will be whether it is feasible to use sufficiently 
many, narrow bins that we do not need to worry about the variation of lensing signal across the 
redshift range of a bin. 

The implementation of BED tomography requires that we have a template galaxy set with 
known redshift probabilities assigned to each. Clearly one issue is how to obtain this information— 
especially since the number of templates required to sample the moment space with desired density 
will increase substantially with additional flux dimensions in M. It is likely infeasible to obtain 
spectroscopic redshifts for a sufficiently large and complete set of templates. A survey such as 
DES which observes galaxies in the grizY hands requires higher-S/A^ observations in these bands 
to create the template moments set; these observations in the survey bands could be supplemented 
with deep data in other bands and with other instruments to tighten the pcv estimates for the 
template set—for example, the COSMOS field has data in many bands across the EM spectrum, 


producing much higher-reliability photo-z’s beyond the spectroscopic limit (Ilbert et al. 2009). It 


will likely be necessary to use spectral-synthesis methods to create artificially redshifted copies of 
the observed template galaxies, just as we synthesize for rotated copies, in order to more 
densely sample the template space and damp the line-of-sight structures (sample variance) present 
in the template fields. We envision BED primarily as a means to rigorously bootstrap the photo-z 
calibration from a well-observed subset of galaxies to the full survey population. 

Incompleteness in the spectroscopic surveys defining the redshift priors is a difficult problem 
(see e.g. Bernstein Sz Huterer 2010| . The BED tomography formalism allows us to propagate 
the uncertainties due to missed redshifts into the final cosmological results: we can reassign the 
probabilities pcu using different assumptions about the missing redshifts, and propagate these 
cases through P{t,D) into cosmological inferences to determine their impact. One could also add 
a “mystery bin” to t to which we assign all template galaxies with poorly known redshift. The 
tomographic BED formalism will calculate the likelihood that any given target has unknown z, and 
cosmological inferences could marginalize over the redshift distribution of the mystery bin. 
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Joint BFD tomography and photo-z is clearly an intriguing and critical extension of the 
method, with quite a few details to work out. We will examine these in future publications. 


7. Conclusion 


The BFD method is now a practical, validated means to estimate WL shear at parts-per- 
thousand accuracy. In our initial large-scale tests, deviations from perfection were measurable 
only with trials of nearly 10® simulated galaxies. Work remains to determine if and why BFD 
has inaccuracies at the level of m = 0.002. Future work will investigate the possible impact of 
aliasing, and approximations in the rendering of images onto finite postage stamps, since we find 
m consistent with zero for our Gauss tests that do no rendering. PSF asymmetries are perfectly 
removed from the shear estimator, to present accuracy. We have implemented a flux selection in 
such a way that we can correct the > 1% selection bias induced on the shear. The BFD formalism 
needs no parameter tuning or calibration to eliminate biases—there is just a free weighting function 
one chooses to minimize noise. A real implementation does have some parameters for sampling the 
infinite distribution of galaxies on the sky, which imply a tradeoff of bias vs observational and 
computational resources. We have used simulations to validate the performance of the method and 
demonstrate that the desired accuracy is attainable with readily available resources to sample the 
underlying galaxy population. 

While BFD assumes that noiseless images of unlensed galaxy populations are available, our 
tests indicate that it retains desired accuracy when the templates are taken from images with 
the same instrument, but ~ lOx longer exposure time than the target survey. This is eminently 
practical, and indeed most planned surveys already have such “deep fields” for other reasons. 


The BFD method also predicts the uncertainty on the shear estimate, and the detection ef¬ 
ficiency, correctly to within the shot noise of our tests. The algorithms should scale to the needs 
of even the largest proposed surveys, and the computational steps are simple, highly parallel and 
amenable to execution on GPU’s if greater speed is needed. 


Sheldon (|2014|) reports \m\ < 2 x 10 ^ when applying the BA14 formalism to likelihoods 


derived from MCMC model-fitting to galaxy images with S/N as low as 10. The galaxy images 
were both drawn and fitted with simple Sersic models, so this work notes that the method may 
be susceptible to “model bias” in more realistic cases. Sheldon (2014) also does not yet include a 
prescription for galaxy selection and resultant biases. 


The only other demonstration of part-per-thousand WL inference at S/N < 10 from a realistic 
algorithm of which we are aware is Zhang et al. (2015), also implemented as the FourierQuad 
method in the GREATS challenge ( Mandelbaum et al.||2015 ). This method shares several character¬ 
istics with BFD: galaxies are reduced to weighted moments in Fourier space, where PSF correction 
is straightforward. Neither method assigns shapes to individual galaxies; FourierQuad works 
by stacking un-normalized moments of the power spectrum; the shear estimator is a quotient of 
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stacks. Using the power spectrum has the advantage of making the estimator insensitive to choice 
of galaxy origin, but amplifies measurement noise by relative to our phase-sensitive moments. 
More problematic is that a stacking method weights galaxies by flux, which is far from optimal. 
FourierQuad does not yet have an approach to selection and weighting of sources without biasing 
shear inferences. BFD is at this time closer to applicability on real data. 


Schneider et al. (2015) propose an ambitious effort to simultaneously model the shear field, the 


pixel-level appearance of galaxies within it, and the underlying distribution of the source galaxies. 
This approach shares some formalism with BFD, but does ultimately rely on parametric models 
for the galaxies. Our “model,” which is that galaxies’ true moments are equal to those of galaxies 
found in a deep sub-survey, should be less subject to model bias than the Schneider et al. (2015) 
approach while greatly reducing the computational complexity. 


There are issues to address before BFD can be applied to real survey data. Working in Fourier 
space means we cannot easily exclude pixel data contaminated by cosmic rays or defects, and 
hence we need some method for infill of pixels or rejection of exposures. Overlapping or multi- 
peaked galaxy images are not handled by BFD, so we will need some combination of model-based 
deblending with rejection of hopeless overlaps that does not significantly bias WL inferences. This 
will be easier in low-density surveys such as DES and KiDS than deep ground-based surveys such 
as LSST and HSC. For space-based surveys, we need to investigate the behavior of BFD in the 
presence of source shot noise that violates our background-limited (stationary) noise assumption. 
We also may need to develop a nonlinearity correction for some applications. 


Our validation tests assume constant shear across all galaxies, but as BA14 point out, it 
is straightforward to calculate a posterior likelihood on the parameters of any model of shear vs 
position, for example for tangential shear vs radius around a selected lens population. Cosmological 
models, however, predict a power spectrum or other statistical property of the WL field rather than 
predicting the shear pattern itself. Current 2-point (and 3-point) estimators for shear assume that 
each source galaxy provides a point estimate of the shear, but BFD returns a different kind of 
information, namely some weak probability distribution for shear along each line of sight in the 
form of {Pi, Qi, R*}. Exploitation of the BFD outputs for lensing statistics will require development 
of new estimation frameworks. Madhavacheril et al. (2015) discuss means to treat such outputs as 
point estimators, and quadratic estimators for 2-point functions that use BFD-style information. 


We have also treated the lensing distortion as a single screen, whereas the sources are dis¬ 
tributed in z and hence we measure a weighted mean of shear on the line of sight. A real experiment 
will need to estimate the z distribution of sources—or more precisely, the distribution of contribu¬ 
tion to the BFD shear estimate. Better yet, we have outlined an extension of BFD to joint Bayesian 
redshift and shear estimation, which directly generates a tomographic lensing likelihood P{Di\t) for 
each source where t contains the shear (and potentially magnification and source density) at a series 
of z bins. This could open the door to full exploitation of the low-to-modest S/N regime—where 
both photo-z and WL estimators have proven difficult to produce without bias—that potentially 
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carries more information than high-S/N galaxies with well-constrained photo-z’s. Work is needed 
to develop statistics to constrain cosmological models with this P{Di\t) information, as opposed 


to the binned point estimates used now. It is likely that there are extensions of the Madhavacheril 


et al. (2015) techniques to this tomographic case. 


A critical question will be how many template galaxies must be observed, particularly in the 
tomographic case where we will need to increase the dimensionality of the moment space that the 
templates sample. This is related to the question of how large and complete a spectroscopic sample 
is needed to calibrate photo-z’s to the accuracy needed for WL cosmology. 


The BFD method also naturally extends to multi-filter or interferometric observations, and 
deals gracefully with the blurring of the stellar and galactic loci in faint surveys. Compared to 
currently dominant model-fitting methods for WL inference, BFD has some disadvantages, such 
as not-quite-optimal use of the pixel information, annoyances with defective pixels, and a less- 
clear route to using crowded sources. BFD’s advantages are, however, substantial, primarily in the 
superior accuracy that comes from having a first-principles treatment of noise and selection, and 
no need to assume a functional form for the sources. 
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NNX11AI25G from NASA, and cooperative agreement with JPL under NASA grant ROSES-12- 
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A. Probabilities for augmented moment noise 


Section 


2.5 


describes a strategy of adding noise with covariance matrix Ca to the moments 


M of a selected target galaxy. We need to know the probability of selecting the galaxy and 
obtaining the total moments Ai = M + Ma- Recall that the originally measured moments can 
also be expressed as M = + Af", where the noise moments have known covariance matrix 

Cm- We assume that both TW” and are drawn from zero-mean multivariate Gaussians. 


The galaxy is detected by the criterion X = + X^ = 0 and selected according to /i < 

Alf < / 2 , and we will use the notation M G S' to denote when this condition is satisfied. We want 
the quantity 


P{M,s\G) = C{X^) [ dM \J{M)\P{M,M\G) 
JMeS 


(Al) 


= C{X 


G\ 




dM^ {J^ + 2M^ • B • M” -h M*" • B • M") C{M - , M"). 

(A2) 


Recall that we are approximating that the Jacobian derivative J = \dX/dxQ\ = M ■ B • M is 
positive wherever the likelihood is non-negligible. Also note that in this Appendix we will suppress 
the dependence of moments and other quantities on the lensing g and the displacement u. 

The joint distribution of the final and initial noise C{A4 — , Af") is a zero-mean normal 

distribution. The covariance matrix of the concatenated noise vectors is known and fully specifies 
the distribution: 


Cov(M) = Ca + Cm ^ C 
Cov(M) = Cm 
C ov(M,M) = Cm- 


Equation ( |A2[ ) can be integrated over the Gaussian distribution; the result is 

-T 


P(M,slG) = £(X^)A'^x |27rCr'/^exp 


- 1 / 2 , 


1 


— (M- M^) C-^ (M - M^) 


X 


{y [j(M) + Tr (BCaC-^Cm) 


- 2y'Z'^BM -h y"Z'^BZ 


}. 


Af = CmC-^M + CaC-^M^, 

1 

.'W//, 


Z = —C.C-^C, 




a}={CAC-^CM)ff- 


(A3) 

(A4) 

(A5) 


(A6) 


(A7) 

(A8) 

(A9) 


Here Cm / is row / of the original moment covariance matrix. The bounded Gaussian integral over 
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/ results in the terms 


F = Y{v, 


mm? ^max } 


Y' 


Y" 


^min 


^max 




dY dY 


^Ujiiax 

dUmin 

d‘^Y 

d^Y 

^ '^max 

^^min 

1 / 


( /min 

-Mf) 

(T/ V 

1 / 

~ \ 

( /max 

-Mf) 

(T/ V 


(AlO) 

(All) 

(A12) 

(A13) 

(A14) 


We replace Equation (38) with a weighted sum of (A6) over template galaxies G and their potential 
displacements xq. As before, derivatives with respect to g propagate through the expressions 
into derivatives of the template moments. The total selection probability from Equation (40) is 
unaltered, since is not added to the moments until the selection process is complete. 


B. Probabilities with translation-invariant noise vector 


In some of our validation tests, we create moment vectors M for targets by calculating 
directly from analytic formulae rather than pixelated images, and adding moment noise drawn 
from its known distribution. These simulated targets differ from image-based simulations in that 
the moment noise realization is invariant under shift of the coordinate origin xq. In this case the 
Jacobian J = dX/dxo has contributions only from the underlying galaxy, not from the noise. We 
therefore must alter our formulae in this case of translation-invariant moment noise realizations. 


Equations (38) and (40) are altered by substituting these equations for the moment probability and 


the total selection probability of each galaxy: 


P(M, s|G) = C (X^) J (M^) C{M - M^) 

P{s\G) = C {X^) J (M^) y. 


(Bl) 

(B2) 


where Y is defined in Equation (34), and we suppress dependence on g and u. 


In the case where we have added noise to the moments after selection in order to better smooth 


the template samples, we replace the formula (A6) with the simpler 
P(AI,s|G) =£{X^)A^x |27rCr^/^exp 


(M - (M - M^) 


(B3) 
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Derivatives and transformations of the Fourier-domain moments 


Implementation of the BFD method requires that we calculate the derivatives of the Fourier- 
domain moments of our template galaxies under lensing distortions. We summarize here the formu¬ 
lae for these derivatives in the case of shear. We give the formulae in the case where the observed 
template surface brightness I{x) is a continuous function. The transition to finite sampled data is 
straightforward. 


Our convention for the Fourier transform of the image is 

I{k)= [ d‘^xl{x)exp{-ik-x). 


(Cl) 


We are interested in the change in moments after the image undergoes an affine transformation 


l'{x) = I (a ^x — Xq) 


Standard Fourier manipulations give 


i'{k) = 
k' = A^k. 


(C2) 


(C3) 

(C4) 


We define a two-component shear g = ( 51 , 52 ) of a galaxy image with the flux-conserving transfor¬ 
mation 


A-i = 


1 




1-51 -92 

-92 1+51 


(C5) 


Note the BFD method is agnostic about the dehnition of shear; this is simply the choice for our 
implementation. 


It is convenient to adopt a complex notation at this point: 

k = kx + iky 

9^91+ i92 

With this notation the action of shear k 

\ / 

k ^ A:' = (1 - 55)“'/' {k - 9k) • 


2 V95i *552 
2 V551 *552 


fA^) ^ k becomes 


(C6) 


(C7) 


The moments we are interested in can also be compactly expressed in complex notation as 


well: 

M'^ = j (fki{k)W{k'k')Fa{k'), 


(C8) 
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with 


Mq = Mf 
Ml = Xx + iXy 
M2 = M+ + zMx 
Mr 

The shear derivative operators can be rewritten as 
Vg = vd + vd 

VgVg = vv^d"^ + vv'^d'^ + 2l2<99 


To = 1 
Fi = ik 
F2 = k^ 

Fr = kk. (C9) 



(CIO) 



(Cll) 


Now the derivatives of the moments with respect to shear are obtained by applying these operators 
to the moment definition (C8) after substituting in the shear wavevector transformation (C7). For 
each moment, the derivatives can be expressed as 


VgMc, = j (fki{k) [W{kk)A^{k) + W'{kk)B^{k)\ (C12) 

VgVgMr, = j (fki{k) [W{kk)Ca{k) + W'{kk)Dr,{k) + W"{kk)Er,{k)\ . (C13) 

Table lists the functions A, B, C, D, E, and F that yield the moments and their derivatives under 
shear. All of the moments and their derivatives are simple weighted polynomial moments of the 
galaxy Fourier transform. 

A translation of the galaxy by xq adds a factor /(fc) in the integrand of all the 

moments (and their derivatives). 
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Table 2. Functional forms of the integrands for moments and their derivatives, as defined by 
Equations (C13). The derivatives of the moments under translation in x and y directions are 
found by adding factors of i{k + k)/2 and (fc — k)/2 to the entries, respectively. 


Moment 

Mo 

Ml 

M2 

Mr 

Fa = 

1 

ik 

¥ 

kk 

Aa = VX 

0 

—ik 

—2kk 

-¥ 

+VX 

0 

0 

0 

-¥ 

Ba = VX 


—ik¥ 

-¥¥ 

-k¥ 

+VX 

-A:2 

—i¥ 

-¥ 

-¥k 

X 

(M 
1—1 

II 

0 

ik 

2¥ 

Akk 

+vv'^x 

0 

0 

2P 

0 

X 

hH 

II 

4:kk 

Qi¥k 

8¥k 

8¥¥ 

+vv'^x 

0 

2i¥ 

Ak¥ 

2¥ 

+vv'^x 

0 

0 

0 

2¥ 

X 

(M 
1—1 
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